Abstract

This paper studies a discrepancy-sensitive approach to dynamic 
fractional cascading. We show, for example, that a search for a value
x in a collection of catalogs, of size at most n, stored in vertices of
a path P can be done in time $O(\log n + \sum_{(v,w)\in P} \log \delta_{v,w}(x))$, where
$\delta_{v,w}(x)$ is the relative local discrepancy at x of the catalogs stored at
the nodes v and w in G. Such an approach is useful in real-world
scenarios, for it leads to faster query and update times in many cases. 
We provide an efficient data structure for dominated maxima searching 
in a dynamic set of points in the plane, which in turn leads to an 
efficient dynamic data structure that can answer queries for nearest 
neighbors using any Minkowski metric. Specific bounds are derived for 
uniformly distributed data, and we also provide experimental results 
that show this discrepancy-sensitive approach works well in practice. 

Keywords: discrepancy, fractional cascading, dynamic data struc- 
tures, nearest neighbors, Minkowski metrics. 

1 Introduction 

Discrepancy theory deals with the degrees to which point sets differ from 
their expected uniformity (e.g., see Chazelle [10]). This theory is usually
applied globally, for entire sets, but we are interested in local notions of 
discrepancy, dealing with how sets differ from their expected uniformity in 
small intervals. This interest is motivated by dynamic fractional
cascading [18].
In fractional cascading [11, 12], we are given a bounded-degree* catalog
graph G, such that each vertex v of G stores a catalog $C(v) \subseteq U$, for a
total order U. The catalogs stored at nodes in G are assumed to be stored
sorted according to the total order U (and we note that this assumption
can be relaxed, for example, for sets of disjoint line segments, so that we
assume only that the C(v)'s contained in any possible search path all belong
to a common total order; that is, the partial order defined by all of the
C(v)'s has a common linearization). Intuitively, a catalog graph G is a data
structure and the paths in G are potential traversals in G that might be
needed in order to answer a given query. Given a value x belonging to the
total order for a path P in G, a query for x in P searches for x in the
catalog C(v) for each vertex v in P. If insertions and deletions are allowed
in the C(v)'s, then we have the "dynamic fractional cascading" [18] problem.
Static fractional cascading solutions due to Chazelle and Guibas [12] allow
for queries to be performed in a path of length k in time $O(\log n + k)$,
where n is the total size of all the catalogs, and dynamic fractional
cascading solutions due to Mehlhorn and Näher [18] show that such queries can
be done in a dynamic setting in $O(\log n + k \log\log n)$ time, with updates
taking $O(\log n \log\log |U|)$ amortized time. Thus, achieving a dynamic
fractional cascading data structure has an increased complexity. Moreover,
this complexity seems
inherently difficult to eliminate in the worst case, as it is based on the 
use of fairly sophisticated data structures that seem necessary in order to 
handle updates to adjacent catalogs that are very different from one another. 
The reduced efficiency of dynamic fractional cascading seems to come from 
its need to dynamically handle discrepancy. Our interest in this paper, 
therefore, is to address discrepancy head on — to design a scheme for dynamic 
fractional cascading that is discrepancy sensitive. The motivation for such 
an approach is that there are a number of applications, motivated by real- 
world scenarios, where the discrepancies between adjacent catalogs are not 
that great. 

1.1 Real-World Nearest Neighbor Queries

Suppose, for example, that in a given region, such as a downtown area or
university campus, there are sensors that keep track of different physical
entities, such as police kiosks, doctors' offices, or coffee shops. When the
services of one of these entities are suddenly and urgently needed (e.g., a
robbery is in progress, someone is having a heart attack, or a paper deadline
is looming), there is a need to quickly compute the position of the entity
closest to the location of the sudden emergency event. The important
observation we make for such real-world scenarios is that service centers,
such as these, are naturally distributed in a fairly uniform way. So if we
were to partition our space according to some reasonable data structuring
scheme, and apply fractional cascading, we should expect adjacent catalogs to
be fairly similar.

*We note that a catalog graph of degree d > 3 can be transformed into a degree-3
catalog graph by replacing high-degree nodes with complete binary trees.

This paper therefore deals with the questions of how to make such notions 
of similarity precise and to design structures that store and dynamically 
update the positions of the entities so that the above-mentioned nearest- 
neighbor queries can be efficiently processed. The data structures we seek 
should handle arbitrary insertions and deletions. This is necessary in order 
to model situations where an entity can suddenly become unavailable or 
available: for instance, a doctor who is no longer on call or has returned 
from caring for a patient, a police officer who is busy handling a robbery, or 
a coffee shop that has run out of espresso. More formally, our motivating 
application can be stated as follows: Given a dynamic set S of n points in the 
plane of real coordinates x and y, we seek to maintain a data structure that 
is (i) space-efficient and (ii) supports fast updates (insertions and deletions), 
as well as fast exact nearest-neighbor (NN) queries. The query points are 
not necessarily in S, but we allow our structure to run faster when S can 
be partitioned into subsets with low relative local discrepancy. 

Previous Related Work. There is a considerable amount of prior related 
work on discrepancy theory and fractional cascading data structures. For 
prior results in discrepancy theory, for example, please see the excellent book 
by Chazelle [10]. Subsequent to the introduction of fractional cascading by
Chazelle and Guibas [11, 12] and its dynamic implementation by Mehlhorn
and Näher [18], there have been many specific uses for this technique, as
well as a generalization, due to Sen [23], based on randomized skip lists, and
an extension for I/O efficiency due to Yap and Zhu [26].

The prior work on nearest neighbor structures is vast; for more detailed
reviews, see the surveys by Alt [1] or Clarkson [13]. Let us focus here
on prior work for planar point sets. For static data, there are several ways
to achieve $O(\log n)$ time for nearest-neighbor queries in the plane, including
constructing a planar point location data structure "on top" of a Voronoi
diagram (e.g., see [22]). For uniformly distributed data, Bentley, Weide,
and Yao [6] give optimal algorithms for static data, and Bentley [4] gives
an optimal algorithm for the semidynamic (deletion only) case. We are
not familiar with any previous optimal fully dynamic algorithms for exact
nearest-neighbor queries in uniformly distributed data. For approximate
nearest-neighbor queries, Arya et al. [3] give an optimal static structure,
and Eppstein et al. [15] give an optimal dynamic structure. Finally, for
general exact nearest-neighbor queries, Chan [8] gives a dynamic method
that achieves polylogarithmic expected times for updates and queries. In
addition, there has been some work on nearest neighbors in non-Euclidean
settings for "reasonably separated" uniform point sets (e.g., see [7, 17, 16]),
but this work does not apply efficiently to Euclidean metrics on point
sets taken from continuous uniform distributions.

Our Results. In this paper, we introduce a study of a discrepancy-sensitive
approach to dynamic fractional cascading. Unlike the Mehlhorn-Näher
approach, which assumes a worst-case distribution for the discrepancies
between adjacent catalogs, our approach is sensitive to these differences.
That is, it runs faster through low-discrepancy neighbors and slower through
high-discrepancy neighbors. We show, for example, that a search for a value
x in a collection of catalogs, of size at most n, stored in vertices of a path
P can be done in time $O(\log n + \sum_{(v,w)\in P} \log \delta_{v,w}(x))$, where
$\delta_{v,w}(x)$ is the relative local discrepancy at x of the catalogs stored at
the nodes v and w in G. Such a discrepancy-sensitive result is useful in a
number of real-world scenarios, as we show that there are several practical
distributions such that the sum of the relative local discrepancies in the
catalogs belonging to a path of length k is $O(k)$ with high probability. For
example, we use this approach to provide an efficient data structure for
dominated maxima searching in a dynamic set of uniformly distributed points
in the plane. This, together with the known fact that the expected number of
maxima points in a uniformly distributed set S of n points in $\mathbb{R}^2$ is
$O(\log n)$, shows that we can construct a dynamic data structure that can
answer queries for nearest neighbors in S using any Minkowski metric, where
insertions and deletions run in $O(\log^2 n)$ expected time and queries run in
$O(\log n)$ expected time. These expectations assume a uniform distribution,
but we observe experimentally that the bounds also hold for real-life (not
uniformly distributed) data.

2 Discrepancy-Sensitive Dynamic Fractional Cascading

As mentioned above, we are interested in this paper in an approach to 
dynamic fractional cascading that is based on a local notion of discrepancy 
in catalog graphs. 

Relative Local Discrepancy in Catalog Graphs. Weisstein [25] defines a
notion for local discrepancy, which, for an interval I, gives a measure of
how much the number of points intersecting I differs from the normalized
length of I. We are, however, interested in the application to dynamic
fractional cascading, which involves comparing adjacent catalogs to each
other, not arbitrary intervals to catalogs. Suppose, therefore, that (v, w) is
an edge in G and that C(v) and C(w) are the catalogs stored respectively
at the vertices v and w in G. Let us assume, without loss of generality,
that C(v) and C(w) both store sentinel values, "$-\infty$" and "$+\infty$," which
are respectively the smallest and the largest elements in the common total
order to which all catalog elements belong. For any value x, and vertex v
in G, let $\mathrm{pred}_v(x)$ denote the predecessor of x in C(v), that is, the largest
element in C(v) less than or equal to x. Likewise, let $\mathrm{succ}_v(x)$ denote the
successor of x in C(v), that is, the smallest element in C(v) greater than or
equal to x. For any edge (v, w) in G, we define the relative local discrepancy
from C(v) to C(w) at x as follows:

$$\delta_{v,w}(x) = |[a, b] \cap C(v)| + |[a, b] \cap C(w)|,$$

where $a = \min\{\mathrm{pred}_v(x), \mathrm{pred}_w(x)\}$ and $b = \max\{\mathrm{succ}_v(x), \mathrm{succ}_w(x)\}$, i.e.,
the relative local discrepancy from C(v) to C(w) at x is the number of items
of C(v) and C(w) falling in the closed interval
$[a, b] = [\mathrm{pred}_v(x), \mathrm{succ}_v(x)] \cup [\mathrm{pred}_w(x), \mathrm{succ}_w(x)]$. It is a measure of
how different C(v) and C(w) are in the vicinity of x. Note that
$\delta_{v,w}(x) \ge 2$, even if C(v) = C(w).
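
The definition above can be evaluated directly on two sorted lists. The following is a minimal sketch, assuming each catalog is a Python list of numbers and that the sentinels are represented by `-inf` and `+inf`; the function names (`with_sentinels`, `pred`, `succ`, `relative_local_discrepancy`) are illustrative and do not come from the paper.

```python
from bisect import bisect_left, bisect_right
import math


def with_sentinels(catalog):
    """Return the catalog sorted, with -inf and +inf sentinels added."""
    return [-math.inf] + sorted(catalog) + [math.inf]


def pred(cat, x):
    """Largest element of the sorted list cat that is <= x."""
    return cat[bisect_right(cat, x) - 1]


def succ(cat, x):
    """Smallest element of the sorted list cat that is >= x."""
    return cat[bisect_left(cat, x)]


def count_in(cat, a, b):
    """Number of elements of cat in the closed interval [a, b]."""
    return bisect_right(cat, b) - bisect_left(cat, a)


def relative_local_discrepancy(cv, cw, x):
    """delta_{v,w}(x) for two sentinel-padded sorted catalogs cv and cw."""
    a = min(pred(cv, x), pred(cw, x))
    b = max(succ(cv, x), succ(cw, x))
    return count_in(cv, a, b) + count_in(cw, a, b)


if __name__ == "__main__":
    cv = with_sentinels([1, 5, 9])
    cw = with_sentinels([2, 3, 4, 6])
    # [a, b] = [1, 5]: two items of C(v) and three items of C(w).
    print(relative_local_discrepancy(cv, cw, 3.5))
```

For identical catalogs the value is 2 when x is itself a catalog element (the predecessor and successor coincide) and 4 otherwise, which matches the lower bound noted above.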

Augmenting a Catalog Graph to Support Searches and Updates. The 
main idea of fractional cascading [11, 12] is to augment a catalog graph G
with auxiliary structures that support efficient searches and, in the dynamic
case [18], updates. The name "fractional cascading" comes from the fact that
an effective way to perform this augmentation is to merge fractional samples 
from the catalogs. Our approach continues this tradition, but implements 
it in a more localized way. 

Let us first give some intuition about our augmentation. Imagine that we 
have a deterministic skip list [21] built "on top" of the elements in C(v) and 
that the nodes in this structure are all colored black. Likewise, imagine that 
we have a deterministic skip list built "on top" of the elements in C(w) and 
that the nodes in this structure are all colored white. These structures allow 
for both top-down and bottom-up searches and updates to be performed in 
$O(\log n)$ time [21]. Now imagine further that we merge these two structures
into a common structure by having each black node "cut" any white edge 
(i.e., interval of white nodes) that it is contained in and having each white 
node "cut" any black edge that it is contained in. Let us then link the roots 
of all the remaining bottom-level skip lists. The remaining structure is the
"fractionally-cascaded" merge of C(v) and C(w) and this is the structure 
that we will maintain dynamically. 

More formally, our structure is defined so that we maintain the following 
substructures for each edge (v, w) in G (see Fig. 1):

• We maintain in a "black" deterministic skip list each maximal
contiguous interval of C(v) that contains no elements of C(w).

• We maintain in a "white" deterministic skip list each maximal
contiguous interval of C(w) that contains no elements of C(v).

• We maintain black-white links between the roots of these skip lists. 

• Each bottom-level skip-list interval that is cut by a skip list of the 
other color has a link to and from the root of that skip list. 
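
To make the first two items concrete, the decomposition into maximal single-color intervals can be sketched as follows. This is only an illustration under simplifying assumptions: the two catalogs are disjoint lists of numbers, each run is kept as a plain Python list rather than a deterministic skip list, and the root and interval links are omitted. The name `monochromatic_runs` is hypothetical.

```python
def monochromatic_runs(cv, cw):
    """Split the merged order of two disjoint catalogs into maximal runs.

    Returns a list of (color, values) pairs, where color is "black" for
    elements of C(v) and "white" for elements of C(w). Each run is a maximal
    contiguous interval of one catalog containing no element of the other.
    """
    merged = sorted([(x, "black") for x in cv] + [(x, "white") for x in cw])
    runs = []
    for value, color in merged:
        if runs and runs[-1][0] == color:
            # Same color as the previous element: extend the current run.
            runs[-1][1].append(value)
        else:
            # Color change: the other catalog "cuts" here, start a new run.
            runs.append((color, [value]))
    return runs


if __name__ == "__main__":
    for color, values in monochromatic_runs([1, 5, 9], [2, 3, 4, 6]):
        print(color, values)
```

In the structure described above, each such run would be stored in its own skip list, so the cost of working inside a run depends on the run's length rather than on the full catalog size.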




Figure 1: An example of the fractionally-cascaded structures that join a 
"black" C(v) to a "white" C(w). Skip-list edges are shown in bold, with 
those cut by a sublist of the opposite color drawn in gray. The links between
skip-list roots are shown dashed and the arrowed lines show the links between
bottom-level skip-list edges and the roots of the opposite-color skip lists that 
cut that edge. 

Searches. A search in a catalog graph G consists of an element x for
which we would like to find $\mathrm{pred}_v(x)$ in C(v) for each node v in a given
path $P = (v_1, v_2, \ldots, v_k)$. We assume that we have a complete deterministic
skip list for the first node, $v_1$, of P. This allows us to locate $\mathrm{pred}_{v_1}(x)$ in
$O(\log n)$ time, where n is the maximum size of any catalog. For locating x
in $C(v_{i+1})$, for $i = 1, \ldots, k-1$, we start from a pointer to $\mathrm{pred}_{v_i}(x)$, which
we will have found inductively. There are two cases at this point:

• Case 1: x falls inside a maximal skip list in $C(v_i)$. In this case, we
traverse up the skip list for this interval in $C(v_i)$ to its root and then
follow the pointer from the root to the interval in $C(v_{i+1})$ containing
x.

• Case 2: x falls outside a maximal skip list in $C(v_i)$. In this case, we
follow the pointer from the "cut" interval in $C(v_i)$ containing x to the
root of the skip list in $C(v_{i+1})$ falling in this interval. We then search
down this skip list to locate the predecessor of x in $C(v_{i+1})$.

Note that, in either case, each step i of the search, after the first, runs in
$O(\log \delta_{v_i,v_{i+1}}(x))$ time, since the size of the skip list we search in for either
case is $O(\delta_{v_i,v_{i+1}}(x))$.

Updates. Let us consider how to perform an update in our structure, that 
is, an insertion or deletion in a C(v) list, assuming we have already located
the place in C(v) where the update is to occur (let us account separately for 
the time needed to find this location). We perform the necessary updates 
for each edge (v,w), of which there are only a constant number, according 
to the following cases: 

• Insert y:

- Case 1: y falls inside a maximal skip list L in C(v). In this case,
we simply insert y in L.

- Case 2: y falls outside a maximal skip list in C(v). In this case,
we follow the interval pointer from the (gray) interval in C(v)
containing y to the skip list L in C(w) and search down for y in
this list. If y falls in the interior of L then we split L at y, set
up y as its own skip list in C(v) and update the pointers of the
three new root nodes. If y falls outside L, then we simply insert y
in the appropriate predecessor or successor skip list in C(v) and
update the (gray) interval to now have y as an endpoint.

• Delete y:

- Case 1: y falls in a maximal skip list L in C(v) with at least
one other element. In this case, we simply remove y from L
(possibly updating boundary pointers if y was the smallest or
largest element in L, or the root pointers if y was a root element,
so that the appropriate adjacent pointers now point to the new
root of L).

- Case 2: y is the only element of its skip list in C(v). In this case,
we follow the pointers from y's (root) node to the two skip lists
in C(w) that y separates, and we perform a splice of these two
structures, updating the root pointers as needed.

Note that in either an insertion or a deletion, the time needed to perform
all the necessary local searching, insertions, deletions, splits, and/or splices
is $O(\log \delta_{v,w}(y))$.

Theorem 1 A catalog graph G, with maximum catalog size n, can be augmented
with additional structures so as to support searches for an element
x in the catalogs in a path P in G in time $O(\log n + \sum_{(v,w)\in P} \log \delta_{v,w}(x))$.
Likewise, a sequence of updates for an element y in catalogs in a path P in
G can be done in these structures in time $O(\log n + \sum_{(v,w)\in P} \log \delta_{v,w}(y))$.

Additional Analysis. So as to better motivate the use of relative local 
discrepancy as a performance parameter, we provide in this subsection some 
additional analysis of our dynamic fractional cascading solution. 

Uniform data. Suppose that each catalog in G contains n points chosen 
independently and uniformly at random from the interval [0, 1]. In this case, 
the set of points in a catalog C(v) defines a set of order statistics, and the
distribution of the length of consecutive spacings therefore follows the Beta
distribution with parameters 1 and n (e.g., see [2, 14]). Thus, the expected
interval length is $1/(n+1)$. Having fixed such an interval in C(v), the number
of points in C(w) that fall in this interval follows a Binomial distribution,
with probability equal to the length of the interval. Thus, the distribution
of each $\delta_{v,w}(x)$ follows the Beta-Binomial distribution, with parameters 1
and n, which has expected value $\mu = n/(n+1)$ [23].
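
The expected value quoted above can be checked with a short conditioning argument. The symbols $\ell$ (the length of a fixed spacing in C(v)) and N (the number of points of C(w) landing in that spacing) are introduced here only for this calculation.

```latex
% \ell: length of a fixed spacing in C(v); N: number of points of C(w) in it.
\begin{align*}
\ell &\sim \mathrm{Beta}(1, n), & \mathbb{E}[\ell] &= \frac{1}{n+1}, \\
N \mid \ell &\sim \mathrm{Binomial}(n, \ell), & \mathbb{E}[N \mid \ell] &= n\,\ell, \\
\mathbb{E}[N] &= \mathbb{E}\bigl[\mathbb{E}[N \mid \ell]\bigr] = n\,\mathbb{E}[\ell] = \frac{n}{n+1}.
\end{align*}
```

So, on average, fewer than one point of the neighboring catalog falls in a given spacing, which is the sense in which adjacent uniform catalogs have low relative local discrepancy.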

The performance of searching and updating our augmented structures
at an element x along a path $P = (v_1, \ldots, v_k)$ in a catalog graph G depends
on the random variable

$$T_P = \sum_{(v_i,v_{i+1})\in P} \log \delta_{v_i,v_{i+1}}(x).$$

Unfortunately, the relative local discrepancies for consecutive edges in P are
not necessarily independent. Even so, we can write

$$T_P = \sum_{(v_i,v_{i+1})\in P,\ \text{odd } i} \log \delta_{v_i,v_{i+1}}(x) \; + \sum_{(v_i,v_{i+1})\in P,\ \text{even } i} \log \delta_{v_i,v_{i+1}}(x), \qquad (1)$$

and we note that the terms within each of the two sums are independent. Thus,
we can bound the degree to which $T_P$ differs from its expectation by adding
bounds on the two sums. Combining this with the expected value of the
associated Beta-Binomial distribution given above, we can use a Chernoff
bound twice (e.g., see [20]) to prove the following (we give the proof in the
final version):

Theorem 2 Given a catalog graph G such that each catalog is a set of $O(n)$
independent, uniform random points in the interval [0, 1], then for any path
P of length k in G, $\sum_{(v,w)\in P} \log \delta_{v,w}(x)$ is $O(k)$ with probability $1 - 1/2^k$.
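
As an informal sanity check of this statement (not a proof, and not the experiments reported in the paper), one can simulate independent uniform catalogs along a path and measure the average of $\log_2 \delta$ per edge. The sketch below assumes catalogs of equal size n, a single uniformly random query x, and a fixed seed; all function names and parameter defaults are illustrative.

```python
import math
import random
from bisect import bisect_left, bisect_right


def delta(cv, cw, x):
    """Relative local discrepancy; cv and cw are sorted with -inf/+inf sentinels."""
    a = min(cv[bisect_right(cv, x) - 1], cw[bisect_right(cw, x) - 1])
    b = max(cv[bisect_left(cv, x)], cw[bisect_left(cw, x)])
    return (bisect_right(cv, b) - bisect_left(cv, a)
            + bisect_right(cw, b) - bisect_left(cw, a))


def random_catalog(n, rng):
    """n independent uniform points in [0, 1), sorted, with sentinels."""
    return [-math.inf] + sorted(rng.random() for _ in range(n)) + [math.inf]


def path_cost(catalogs, x):
    """T_P: sum of log2(delta) over consecutive catalogs on the path."""
    return sum(math.log2(delta(cv, cw, x))
               for cv, cw in zip(catalogs, catalogs[1:]))


def average_cost_per_edge(n=1000, k=50, seed=0):
    """Average of log2(delta) over the k edges of one random path."""
    rng = random.Random(seed)
    catalogs = [random_catalog(n, rng) for _ in range(k + 1)]
    x = rng.random()
    return path_cost(catalogs, x) / k


if __name__ == "__main__":
    print(average_cost_per_edge())
```

Since every $\delta$ is at least 2, the per-edge average is at least 1; a small constant average is what an $O(k)$ total would look like in practice.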

Using this result, we can take the dynamic range searching structure of
Mehlhorn and Näher [18], which is based on range trees (e.g., see [22]), and
replace their dynamic fractional cascading solution with ours, which gives
us the following:

Theorem 3 We can maintain a dynamic range searching data structure for 
a set of points taken uniformly at random in the unit cube so as to support 
point insertions and deletions in O(logn) time w.h.p. and the reporting of 
all the points in a rectangular query range $[x_1, x_2] \times [y_1, y_2]$ in $O(\log n + k)$
time w.h.p., where k is the number of points returned by the query. 

Our data structure is deterministic, and works in the standard comparison- 
based pointer-machine model. It therefore matches w.h.p. the range-searching 
query and update times of Mortensen [19], which are instead for the RAM 
model and make use of bit twiddling. The high-probability bound in
Theorem 3 comes from Theorem 2.

Other Upper bounds. Given the above proof technique derived for the
uniform data case, we can motivate bounds on $T_P$ for other distributions by
probabilistically bounding each of the two sums in Equation (1). That is,
let us concentrate on the odd i case (the analysis for even i being similar)
and let us consider a random variable $X = X_1 + \cdots + X_{k/2}$ such that, for each
$i = 1, \ldots, k/2$,

$$\Pr(\log \delta_{v_{2i-1},v_{2i}}(x) \le y) \ge \Pr(X_i \le y),$$

that is, each $X_i$ stochastically dominates the corresponding term of the sum.
Then we can bound the odd summation with a bound for X. In particular, we can
use various Chernoff bounds to show that X is $O(k)$ with high probability for
each of the following cases:

• Each Xi is Binomial with constant expected value. 

• Each Xi is geometric (the discrete counterpart to the exponential dis- 
tribution). 

• Each Xi is Poisson with bounded expected value. 

Having provided our general framework for discrepancy-sensitive dy- 
namic fractional cascading, let us give a concrete application to nearest- 
neighbor searching. 






3 Dynamic Dominated Maxima 



This section describes a scheme for dynamically maintaining a set S of points
drawn from a uniform distribution in a rectangle, so that a dominated maxima
query can be done in $O(\log n)$ expected time: Given a query point q, the
query returns the set of maximal elements among the points of S that are
dominated by q; note that the expected size of the output is itself $O(\log n)$
(because of the uniform distribution). The expected time for an update will
be shown to be $O(\log^2 n)$.

We shall find it necessary to maintain 4 such data structures, one for each 
of the 4 possible sets of coordinate axes obtained by reversing the direction of 
{neither, one, both} of the x and y axes - having all 4 such structures makes 
it possible to achieve the bounds we claim but imposes only a constant factor 
of 4 on the complexity bounds. 

In order to more explicitly define the 4 above-mentioned problems, and
also to facilitate the understanding of our algorithm, we will consider the
smallest origin-centered square containing the whole set S for a given state
of S. We position four coordinate systems, one at each of the four corners
of the square, with the origin being at the corresponding corner and the
directions of the axes pointing from the origin along the edges of the square.
We call these four coordinate systems South-West (abbreviated as SW),
South-East (SE), North-West (NW), and North-East (NE). For a point $q \in S$,
we use $x_{SW}(q)$ (resp., $y_{SW}(q)$) to denote the x (resp., y) coordinate of q
in the SW coordinate system. A similar notation is used for the other
three coordinate systems. Such coordinate systems and point coordinates
are depicted in Fig. ??.

The 4 problems mentioned above are then the following: (i) a South-West
problem that pertains to the subset of S that is dominated by the
query point $q_0$ in the SW coordinate system, i.e., the subset "below and
to the left of $q_0$"; (ii) a South-East problem that pertains to the subset of
S that is dominated by the query point $q_0$ in the SE coordinate system
(the subset "below and to the right of $q_0$"); (iii) a North-East problem that
pertains to the subset of S that is dominated by the query point $q_0$ in the
NE coordinate system (the subset "above and to the right of $q_0$"); and (iv)
a North-West problem that pertains to the subset of S that is dominated
by the query point $q_0$ in the NW coordinate system (the subset "above and
to the left of $q_0$").

Recall that a point q is maximal in the set S relative to the SW coordinate
system iff for every other point $q' \in S$ at least one of the following
inequalities holds:

$$x_{SW}(q') < x_{SW}(q), \qquad y_{SW}(q') < y_{SW}(q),$$

which, in words, can be stated as: "no other point of S dominates q in the
SW coordinate system." For a point q and a set S we also define the notion
of a maximal set in the SW coordinate system with respect to q. This set,
denoted by $M_{SW}(S, q)$, is computed by first considering only those points
in S that are dominated by q in the SW coordinate system (i.e., the subset
of S below and to the left of q) and then computing the maximal points of
that subset. All points in $M_{SW}(S, q)$ are assumed to be sorted by increasing
x coordinates. A similar notation is used for the other three coordinate
systems.
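
As a reference point for the definition of $M_{SW}(S, q)$, the following brute-force sketch computes it by filtering and then sweeping from right to left. It is not the dynamic structure developed in this section; it assumes points are (x, y) tuples in the SW frame and that dominance is closed (a point with a coordinate equal to that of q counts as dominated). The name `dominated_maxima_sw` is hypothetical.

```python
def dominated_maxima_sw(points, q):
    """Return M_SW(S, q): the maximal points among those dominated by q.

    The result is sorted by increasing x (and therefore decreasing y).
    Runs in O(n log n) time on a static list; for reference only.
    """
    qx, qy = q
    dominated = [(x, y) for (x, y) in points if x <= qx and y <= qy]
    # Visit points by decreasing x (ties: decreasing y). A point is maximal
    # exactly when its y exceeds every y seen so far, because every point
    # seen earlier has an x coordinate at least as large.
    dominated.sort(key=lambda p: (-p[0], -p[1]))
    maxima, best_y = [], float("-inf")
    for x, y in dominated:
        if y > best_y:
            maxima.append((x, y))
            best_y = y
    maxima.reverse()
    return maxima


if __name__ == "__main__":
    S = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 6), (2, 2)]
    print(dominated_maxima_sw(S, (4.5, 5.5)))
```

The output forms a staircase: as x increases, y decreases, which is why the leftmost point of the set is also its highest and the rightmost its lowest.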

In the rest of our discussion we focus on the South-West problem. All
of our solutions for this South-West problem can be translated into similar
ones for the South-East, North-East, and North-West problems.

3.1 The Data Structure 

Let $T_x$ be an n-node search tree structure whose nodes are the n points of S
ordered by their x coordinates. $T_x$ satisfies the following properties, v being
a node of $T_x$:

• $T_x$ is a weight-balanced binary search tree.

• All nodes in the right subtree of v have a greater x value than v.

• All nodes in the left subtree of v have a smaller x value than v.

For each node v in $T_x$, we use $Sl_v$ to denote the subset of S that lies in
the subtree of v and has x coordinate less than or equal to that of v. Each
such $Sl_v$ is itself organized as a dynamic search structure according to the
y coordinates of the points in it. The $T_x$ tree and its associated $Sl_v$'s are
organized as the dynamic fractional cascading structure described above.
With this structure in place, for every path $\mathcal{V}$ in $T_x$, searching for $y_0$ in $Sl_v$
for every $v \in \mathcal{V}$ can be done in $O(\log n + |\mathcal{V}|)$ expected time.

An update to this structure due to insertion or deletion of a point consists
of adding or removing a node of $T_x$, updating all the $Sl_v$ sets from that
node to the root, and finally rebalancing $T_x$. Note that the insertion of
a point $(x_0, y_0)$ does not cause the creation of a new node in $T_x$ if there
already exists a point with x coordinate $x_0$, but only an update in the
underlying dynamic fractional cascading structure. The equivalent property
holds for deletion. Rebalancing the tree implies $O(1)$ rotations. A rotation
associated with three nodes $v, v', v''$ implies the reconstruction of the
underlying sets $Sl_v, Sl_{v'}, Sl_{v''}$, that is, $O(|Sl_v|)$ insertions and deletions
in the dynamic fractional cascading structure. Since $T_x$ is a weight-balanced
search tree, the amortized value of $|Sl_v|$ is $\log n$. Thus an update to this
structure takes $O(\log n)$ amortized time.

In addition to the above, each copy of a point q in Sl v stores the following: 

• $l_{SW}(v,q)$ = the leftmost (hence, highest) point in $M_{SW}(S_v, q)$.

• $r_{SW}(v,q)$ = the rightmost (hence, lowest) point in $M_{SW}(S_v, q)$.

• $l_{SE}(v,q)$ = the leftmost (hence, lowest) point in $M_{SE}(S_v, q)$.

• $r_{SE}(v,q)$ = the rightmost (hence, highest) point in $M_{SE}(S_v, q)$.

• $l_{NW}(v,q)$ = the leftmost (hence, lowest) point in $M_{NW}(S_v, q)$.

• $r_{NW}(v,q)$ = the rightmost (hence, highest) point in $M_{NW}(S_v, q)$.

• $l_{NE}(v,q)$ = the leftmost (hence, highest) point in $M_{NE}(S_v, q)$.

• $r_{NE}(v,q)$ = the rightmost (hence, lowest) point in $M_{NE}(S_v, q)$.

The above quantities will be shown to facilitate a query, but they also impose 
the burden of dynamically updating them. We need to describe how a query 
is processed, and how to dynamically update all of the above quantities. 

3.2 Processing a Query 

The query processing consists of, given a query point $q_0$, returning the
maximal elements of the subset of S dominated by $q_0$ in the SW coordinate
system. (The query point is arbitrary and need not be in S.)

More formally, to process a query for a point $q_0$ with the coordinates
$(x_0, y_0)$, we do the following:

1. First we locate the node which has the greatest x value less than or equal
to $x_0$ in $T_x$, thereby defining a root-to-leaf path $\mathcal{V}$ in $T_x$. Let
$v_1, \ldots, v_t$ be (in left to right order) the nodes whose right sibling is on
$\mathcal{V}$. We henceforth refer to these nodes as the fringe of $x_0$ in $T_x$. Note that
$t \le \log n$, and that every point in $\bigcup_{i=1}^{t} Sl_{v_i}$ has an x coordinate that
is $\le x_0$ and that there are no other such points.

2. Within every $Sl_{v_i}$, $1 \le i \le t$, let $y'_i$ be the largest y coordinate that is
$< y_0$. Computing all the $y'_i$'s involves locating $y_0$ in every $Sl_{v_i}$. Using
the dynamic fractional cascading search structure, the computation of
all the $y'_i$'s can be done in $O(\log n + t)$ expected time, which is $O(\log n)$.

3. Let $Y_1, \ldots, Y_t$ be defined inductively as follows:

(a) $Y_t = -\infty$
(b) $Y_{k-1} = \max\{Y_k, y'_k\}$ for $k = t, t-1, \ldots, 2$.

In words, $Y_k$ ($k < t$) is the largest y coordinate below $y_0$ among the points in
$\bigcup_{i=k+1}^{t} Sl_{v_i}$.

4. Enumerate the points in $M_{SW}(S, q_0)$. Before explaining how this
enumeration is done, we point out that the point of S that constitutes the
South-West solution must belong to $M_{SW}(S, q_0)$, which is easy to prove
by contradiction. We also point out that the expected number of
points in $M_{SW}(S, q_0)$ is $O(\log |S|)$, hence $O(\log n)$. Thus, the $O(\log n)$
average query performance would be achieved if we could somehow
enumerate the points of $U = M_{SW}(S, q_0)$ in time $O(|U|)$. We do this
by first observing that the subset of S from which the maximal points
are computed consists of the subset of $\bigcup_{k=1}^{t} Sl_{v_k}$ having y coordinates
$< y_0$. Our strategy will be to enumerate, in the order $k = 1, \ldots, t$, the
maximal points of $Sl_{v_k}$ that belong to U, call their set $U_k$, stopping
as soon as the about-to-be-enumerated y coordinate drops below $Y_k$. (If
we did not stop at that point, we would be enumerating points that
do not belong to U.) This enumeration of $U_k$ is done as follows:

(a) Let $q_k$ be the point with the y coordinate $y'_k$ (that is, $q_k$ is the
highest point of $Sl_{v_k}$ whose y coordinate is $< y_0$).

(b) While the y coordinate of $q_k$ is $> Y_k$, we (i) include $q_k$ as a member
of $U_k$, and then (ii) set $q_k = r_{SE}(v_k, q_k)$, which is the rightmost
(hence, highest) point in $M_{SE}(S_{v_k}, q_k)$.

Of course, in the above, U is the concatenation of $U_1, \ldots, U_t$.

5. Since we have not checked the points with y-coordinate equal to $y_0$ in
$M_{SW}(S, q_0)$, we need to add them to U. This can be done by searching
for $y_0$ in the fringe of $x_0$, which takes $O(\log n)$ expected time using the
fractional cascading structure.
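
The role of the thresholds $Y_k$ in steps 3 and 4 can be illustrated with a simplified sketch. It assumes the fringe sets are already given as lists of (x, y) points, ordered left to right so that every x in one set is smaller than every x in the next, and that all x coordinates are at most $x_0$. It recomputes each staircase by brute force instead of following the stored $r_{SE}$ pointers, and it folds step 5 into the filter by using $\le y_0$. The names `staircase` and `maxima_via_fringe` are hypothetical.

```python
def staircase(points):
    """Maximal points of a set in the SW frame, sorted by increasing x."""
    best_y, out = float("-inf"), []
    for x, y in sorted(points, key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            out.append((x, y))
            best_y = y
    out.reverse()
    return out


def maxima_via_fringe(fringe_sets, q):
    """Assemble M_SW(S, q) from fringe sets ordered left to right by x."""
    _, qy = q
    below = [[p for p in s if p[1] <= qy] for s in fringe_sets]
    t = len(below)
    # thresholds[k]: largest y among the dominated points in the sets to the
    # right of set k (the quantity Y_k in the text, with 0-based indices).
    thresholds = [float("-inf")] * t
    for k in range(t - 2, -1, -1):
        top = max((p[1] for p in below[k + 1]), default=float("-inf"))
        thresholds[k] = max(thresholds[k + 1], top)
    result = []
    for k in range(t):
        # On a staircase y decreases as x increases, so the points above the
        # threshold form a prefix; filtering matches "stop once y drops".
        result.extend(p for p in staircase(below[k]) if p[1] > thresholds[k])
    return result


if __name__ == "__main__":
    fringe = [[(1, 5), (2, 3)], [(3, 4), (2.5, 2)], [(4, 1)]]
    print(maxima_via_fringe(fringe, (4.5, 5.5)))
```

A point of set k survives only if it is maximal within its own set and lies above everything dominated in the sets to its right, which is why a single threshold per set suffices.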

As argued above, the average complexity of the above query processing is 
O(logn). We now turn our attention to the dynamic updates. We begin 
with the case of insertions. 

3.3 Processing an Insertion 

Let $q_0 = (x_0, y_0)$ be the point being inserted. We already argued that the
fractional cascading structure can be updated in $O(\log n)$ expected time as
a result of this insertion. The main task we face now is how to update the
quantities $l_{SW}(v,q)$, $r_{SW}(v,q)$, $l_{SE}(v,q)$, $r_{SE}(v,q)$, $l_{NW}(v,q)$, $r_{NW}(v,q)$,
$l_{NE}(v,q)$, and $r_{NE}(v,q)$, for each $q = (x, y) \in S$ and each v that is an ancestor
of x in $T_x$. We explain how to update only $r_{SE}(v,q)$ for all v's that are
ancestors of x in $T_x$; similar updating can be repeated for each of the seven
other quantities (relative to their own frame of reference).

We begin with the updating of the $r_{SE}(v,q)$'s for all points other than
$q_0$ (i.e., the points in $S - \{q_0\}$). We then explain how to compute the
$r_{SE}(v,q_0)$'s separately.

The first step is to compute, as a query that is processed just as in the
previous section (except that the coordinate system is different), the set U =
M_NE(S, q_0), where, as before, the expected size of U is O(log n). The only
points q of S whose r_SE(v,q) may change are in U. For each point q of U,
we update its (at most log n) r_SE(v,q) values. This is done in constant time
for each value, by checking whether q_0 can cause an improvement when v is
an ancestor of q_0. The total update time for doing this is therefore
O(|U| log n), which is O(log^2 n) on average.
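
The constant-time check per stored value can be sketched as below. This is
a hedged illustration only. We assume distinct coordinates, and we assume
that r_SE(v,q) is the highest point of the subtree of v lying below and to
the right of q, so that q_0 "improves" a value exactly when it lies in that
quadrant of q and is higher than the current value. The dictionary layout
(rse, ancestors) and the function name are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

Point = Tuple[float, float]
Node = Hashable


def update_rse_on_insert(rse: Dict[Tuple[Node, Point], Optional[Point]],
                         U: Iterable[Point],
                         q0: Point,
                         ancestors: Dict[Point, List[Node]]) -> None:
    """Update the stored r_SE(v, q) values after inserting q0 (sketch)."""
    q0_ancestors = set(ancestors[q0])
    for q in U:
        # q0 can only matter if it lies below and to the right of q.
        if not (q0[0] > q[0] and q0[1] < q[1]):
            continue
        for v in ancestors[q]:
            if v not in q0_ancestors:
                continue  # q0 is not stored in the subtree of v
            current = rse.get((v, q))
            # Constant-time test: does q0 beat the current highest point?
            if current is None or q0[1] > current[1]:
                rse[(v, q)] = q0
```

With at most log n ancestors per point and |U| candidate points, the two
nested loops perform O(|U| log n) constant-time checks, matching the bound
stated above.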

To compute the r_SE(v, q_0), we first compute U' = M_SE(S, q_0) as a query,
hence in O(log n) expected time. We then walk along the path from x_0 to
the root in T_x, and at each node v along this path we set r_SE(v, q_0) equal to
the highest point of U' that is in S_v. Note that this whole walk can be done
in time O(log n) because of monotonicity: The S_v's of the nodes on that
walk to the root monotonically "swallow" U' in left-to-right order (hence,
by increasing y coordinates). Thus we end up going through U' only once
(not log n times).
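
The single pass over U' can be sketched as follows. This is a minimal
sketch under assumptions that are ours: U' is given as a list sorted
left to right (hence by increasing y), and each node on the leaf-to-root
path is represented only by the right end of the x-range covered by its
subtree (the hypothetical list ancestor_x_max, which is non-decreasing
toward the root).

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def walk_to_root(u_prime: List[Point],
                 ancestor_x_max: List[float]) -> List[Optional[Point]]:
    """For each ancestor, return the highest point of U' inside its x-range."""
    result: List[Optional[Point]] = []
    i = 0  # number of points of U' "swallowed" so far; never moves backward
    for x_max in ancestor_x_max:
        while i < len(u_prime) and u_prime[i][0] <= x_max:
            i += 1
        # The last swallowed point is the rightmost, hence the highest.
        result.append(u_prime[i - 1] if i > 0 else None)
    return result


print(walk_to_root([(3, 1), (5, 2), (8, 4)], [2, 5, 5, 9]))
# prints [None, (5, 2), (5, 2), (8, 4)]
```

Because the index i only advances, the total work is proportional to
|U'| plus the length of the path.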

3.4 Processing a Deletion 

Let q_0 = (x_0, y_0) be the point being deleted. We already argued that the
fractional cascading structure can be updated in O(log n) expected time as
a result of this deletion. Now we need to show how to update the quantities
l_SW(v,q), r_SW(v,q), l_SE(v,q), r_SE(v,q), l_NW(v,q), r_NW(v,q), l_NE(v,q),
and r_NE(v,q), for each q = (x,y) ∈ S and each v that is an ancestor of x_0 in
T_x. We explain how to do it for r_SE(v,q) for all v that are ancestors of x_0
in T_x; all other values are updated similarly (relative to their own frame of
reference).

First, we compute each of the sets U = M_NW(S, q_0) and U' = M_SW(S, q_0)
as queries (and, hence, in O(log n) expected time). The only points q of S
whose r_SE(v,q) may change as a result of the deletion are in U. Moreover,
for each such point q whose r_SE(v,q) changes, its new r_SE(v,q) is either in
U' or it is the old r_SE(v,q_0). Finding the best candidate from U' for each
q ∈ U need not be done in isolation; rather, it can be done for all the points
of U together. This can be performed in a manner reminiscent of the way two
sorted lists are merged, by walking simultaneously along U and U'. This
has to be done only once (not repeated for the r_SE(v,q) of every ancestor
v of x_0). On the other hand, the comparison of the old r_SE(v,q) with the
two new candidates, which are the old r_SE(v,q_0) and the point of U'
determined during the above-mentioned merge-like procedure, needs to be done
for every v and q. Hence, the overall time for a deletion is O(log^2 n) on
average.
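
The merge-like walk can be illustrated with a two-pointer pass. The
selection rule used here is an assumption made purely for illustration
(the text above does not spell it out): for each q in U we pick the
leftmost point of U' that lies strictly to the right of q. Both lists are
assumed to be sorted by increasing x coordinate, and the function name is
hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def merge_candidates(U: List[Point],
                     U_prime: List[Point]) -> Dict[Point, Optional[Point]]:
    """Pair each q in U with a candidate from U' in one simultaneous walk."""
    candidates: Dict[Point, Optional[Point]] = {}
    j = 0  # position in U'; it only advances, as in a merge of sorted lists
    for q in U:
        # Illustrative rule (an assumption): skip points of U' not right of q.
        while j < len(U_prime) and U_prime[j][0] <= q[0]:
            j += 1
        candidates[q] = U_prime[j] if j < len(U_prime) else None
    return candidates


print(merge_candidates([(1, 9), (4, 8)], [(2, 3), (3, 2), (6, 1)]))
# prints {(1, 9): (2, 3), (4, 8): (6, 1)}
```

Whatever the exact selection rule, the point of the technique is that
neither pointer moves backward, so the walk costs O(|U| + |U'|) in total
rather than O(|U| |U'|).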

4 Dynamic Nearest Neighbors in Minkowski Metrics

Given a nearest-neighbor query for a point q_0, in a set S of uniformly
distributed points in an axis-aligned rectangle, we partition the problem
into four sub-problems: (i) a South-West problem that consists of computing
the nearest neighbor from among the subset of S that is dominated by the
query point q_0 in the SW coordinate system, i.e., the subset "below and
to the left of q_0"; (ii) a South-East problem that consists of computing the
nearest neighbor from among the subset of S that is dominated by the query
point q_0 in the SE coordinate system (the subset "below and to the right
of q_0"); (iii) a North-East problem that consists of computing the nearest
neighbor from among the subset of S that is dominated by the query point
q_0 in the NE coordinate system (the subset "above and to the right of
q_0"); and (iv) a North-West problem that consists of computing the nearest
neighbor from among the subset of S that is dominated by the query point
q_0 in the NW coordinate system (the subset "above and to the left of q_0").
We solve all of (i)-(iv) and choose, as the solution to the nearest-neighbor
query, the best of the four answers they return. Our performance bounds
for this problem therefore immediately follow from those we established in
the previous section for the dynamic dominated maxima problem: O(log n)
expected query time, and O(log^2 n) expected time for an update (insertion
or deletion).
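
The four-way decomposition can be checked on small inputs with a
brute-force sketch. The code below is not the data structure of the
previous section: it computes each quadrant's maxima by a quadratic scan,
only to illustrate that the nearest neighbor under an L_p metric (p >= 1,
including p = infinity) can be found among the dominated maxima of the four
quadrants. All function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def minkowski(a: Point, b: Point, p: float) -> float:
    """L_p distance in the plane; p may be math.inf."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    if math.isinf(p):
        return max(dx, dy)
    return (dx ** p + dy ** p) ** (1.0 / p)


def quadrant_maxima(points: List[Point], q: Point, sx: int, sy: int) -> List[Point]:
    """Maxima of one quadrant of q, after flipping it into the SW position."""
    local = [(sx * (x - q[0]), sy * (y - q[1]), (x, y)) for (x, y) in points]
    inside = [t for t in local if t[0] <= 0 and t[1] <= 0]
    maxima = []
    for ax, ay, a in inside:
        dominated = any(bx >= ax and by >= ay and (bx, by) != (ax, ay)
                        for bx, by, _ in inside)
        if not dominated:
            maxima.append(a)
    return maxima


def nearest_neighbor(points: List[Point], q: Point, p: float = 2.0) -> Optional[Point]:
    """Best answer over the SW, SE, NE and NW sub-problems."""
    best: Optional[Point] = None
    for sx, sy in ((1, 1), (-1, 1), (-1, -1), (1, -1)):  # SW, SE, NE, NW
        for c in quadrant_maxima(points, q, sx, sy):
            if best is None or minkowski(c, q, p) < minkowski(best, q, p):
                best = c
    return best
```

The restriction to maxima is safe because, within one quadrant, a point
that dominates another is at least as close to q in both coordinates, and
hence at least as close in any L_p metric.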

5 Experimental Results 

In this section we compare our results, which hold for evenly distributed
sets, against real data sets. First we evaluate the local discrepancy
distribution along a path down a search tree T_x, as we use it in our
nearest-neighbor data structure, for both real and evenly distributed data.
Then we show the validity of the expected log(n) size of M_SW(S, q) in the
case of real data.

Real Data Set.

To run our experiments, we have chosen to use real data extracted
from TIGER¹ by Dr. Yufei Tao². Originally, the data represented areas in
Long Beach County (areas that could be seen as RFID reader ranges). We
have kept the centers of those areas and stored their x and y coordinates. This
data set is interesting because it represents well the density variations
in the different areas of the city, characteristic of human activity. Let S
denote this real data set, of cardinality 53145. To avoid under-estimated
results induced by border effects (querying from points outside the set or on
its border), we have run the tests on points belonging to the real data set S
itself and located in a restricted inner area of S of high density.

Local Discrepancy on a Range Tree. In this section we explore the
distributions of the local discrepancy in the catalogs of the nodes of a range
tree, augmented using our dynamic fractional cascading structure.

Protocol. To evaluate the distributions of the local discrepancy along a
path in the range tree we use, we inserted the points of the real data
set S in such a range tree and chose random query points (x,y). For each
point, we calculated the local discrepancy relative to y for each edge on the
path from the leaf associated with x to the root of the tree. We also did the
same with the same number of evenly distributed points (see Fig. 2).

Results. As we see in Fig. 2, the distributions of local discrepancy for the
real data set are very close to the distributions of local discrepancy in the
case of evenly distributed points. Their plots in logarithmic scale indicate
that they are very close to exponential distributions, which indicates that
the proof of Theorem 3 still holds in the case of the real data set.

Maxima Points Chains Length. In this section we show the validity of the
expected log(n) size of M_SW(S, q) in the case of real data.

Protocol. Our results present an evaluation of the cardinality of M_SW(S, q)
on the real data set S described above. To be able to evaluate |M_SW(S, q)|
as a function of |S|, we ran the tests on S itself and then on S reduced by
random deletions of points. These random deletions preserved the original
distribution of S while decreasing its cardinality. Let us denote those reduced
sets by S_k. The tests were run by querying randomly chosen points q ∈ S_k
for each reduced subset S_k.

We also ran the tests under the same conditions on evenly distributed sets
R_k of cardinality equal to |S_k|.

¹ U.S. Census Bureau - Topologically Integrated Geographic Encoding and
Referencing system

² http://www.cs.cityu.edu.hk/~taoyf/ds.html

Figure 2: Distributions of the local discrepancy along a top-down path in a
range tree, using the real data set (upper left) and evenly distributed
points (upper right). The distribution labeled height k represents the
distribution of local discrepancy for edges between nodes at heights k - 1
and k, containing respectively 2^(k-1) and 2^k points in their catalogs. The
two plots below show the same distributions on a log scale.

Results. The cardinality of M_SW(S, q) is expected to be O(log |S|), and
more precisely bounded by h(|S|) = 1 + X^L—i \ P3 if the points of S are
evenly distributed. Fig. 3 shows our experimental results for the sets S_k and
R_k compared to h(|S_k|) and log(|S_k|). We see that the assertion holds in
the case of the real data points we used, which we believe are representative
of the type of distribution our system could deal with.
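
For readers who wish to reproduce the logarithmic growth on synthetic
data, the following sketch counts the maxima of n uniformly distributed
points and compares the average with the harmonic number H_n, which is the
known expected number of maxima of n independent uniform points in the
plane. This is an independent sanity check under our own assumptions
(unrestricted maxima of the whole set, Python's pseudo-random generator,
hypothetical function names), not the experimental protocol described above.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def count_maxima(points: List[Point]) -> int:
    """Count points not dominated by a point with larger x and larger y."""
    count = 0
    best_y = float("-inf")
    # Sweep from right to left; a point is maximal if it is the highest so far.
    for _, y in sorted(points, reverse=True):
        if y > best_y:
            count += 1
            best_y = y
    return count


def harmonic(n: int) -> float:
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / i for i in range(1, n + 1))


def average_maxima(n: int, trials: int, seed: int = 0) -> float:
    """Average number of maxima over several uniform random point sets."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pts = [(rng.random(), rng.random()) for _ in range(n)]
        total += count_maxima(pts)
    return total / trials


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, round(average_maxima(n, 100), 2), round(harmonic(n), 2))
```

The averages should track H_n, which grows like ln n, consistent with the
logarithmic chain lengths discussed in this section.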

The two experimental results put together indicate that the complexity
bounds stated for our dynamic nearest-neighbor solution still hold in
the case of real data sets.







Figure 3: Results for |M_SW(S_k)|. The x-axis represents the number of points
in S_k, while the y-axis represents the cardinality of M_SW(S_k).